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Abstract — There  are  many  well-known  sources  of  nonlinearity 
present  in  hyperspectral  imagery;  these  include  bi-directional  re¬ 
flectance  distribution  function  (BRDF)  effects,  multi-path  scatter 
between  heterogeneous  pixel  constituents,  and  the  variable  pres¬ 
ence  of  water,  an  attenuating  medium,  in  the  scene.  In  recent  pub¬ 
lications,  we  have  presented  a  data-driven  approach  to  represent¬ 
ing  the  nonlinear  structure  of  hyperspectral  imagery  [4].  The  ap¬ 
proach  relies  on  graph  methods  to  derive  geodesic  distances  on  the 
high-dimensional  hyperspectral  data  manifold.  From  these  dis¬ 
tances,  a  set  of  manifold  coordinates  that  parameterizes  the  data 
manifold  is  derived.  Because  of  the  computational  and  memory 
overhead  required  in  the  geodesic  coordinate  calculations,  the  ap¬ 
proach  relies  on  partitioning  the  scene  into  subsets  where  the  op¬ 
timal  manifold  coordinates  can  be  derived  in  an  efficient  manner, 
followed  by  an  alignment  stage  during  which  the  embedded  man¬ 
ifold  coordinates  for  each  subset  are  aligned  to  a  common  man¬ 
ifold  coordinate  system.  In  [4],  we  demonstrated  the  feasibility 
of  the  coordinate  and  alignment  methodology  and  the  ability  of 
the  manifold  approach  to  provide  higher  data  compression  and 
more  effective  classification  when  compared  with  linear  methods. 
In  this  paper  we  develop  an  improved  approach  to  the  manifold 
coordinate  alignment  phase  with  an  improved  sampling  method¬ 
ology.  Results  are  demonstrated  using  examples  of  hyperspectral 
imagery  derived  from  PROBE2  hyperspectral  scenes  of  the  Vir¬ 
ginia  Coast  Reserve  barrier  islands. 


I.  Introduction  and  Background 
A.  Nonlinearity  in  Hyperspectral  Imagery 

Nonlinearity  in  hyperspectral  imagery  is  a  significant  source 
of  estimation  errors  in  derived  products.  Sources  of  non¬ 
linearity  include:  (1)  nonlinear  variations  in  reflectance  pro¬ 
duced  by  variations  in  sun-canopy-sensor  geometry  in  the  land¬ 
scape  [15]  [9],  (2)  multi-path  scatter  among  sub-pixel  con¬ 
stituents  [12]  [14],  violating  the  traditional  linear  mixing  as¬ 
sumptions,  (3)  the  variable  presence  of  water,  an  attenuating 
medium  [1 1]  in  the  scene.  Some  of  the  errors  that  we  observed 
in  mapping  products  that  we  previously  derived  in  [2]  [3]  be¬ 
came  the  motivation  for  finding  new  methods  of  modeling  non¬ 
linear  structure  in  hyperspectral  data  [4].  In  the  next  two  sub¬ 
sections,  we  give  a  brief  overview  of  the  approach  that  we  pre¬ 
sented  in  [4]  as  a  preamble  to  introducing  improvements. 
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B.  Manifold  Coordinate  Representations 

In  [4],  we  described  a  new  method  for  modeling  nonlinear 
effects  in  hyperspectral  imagery  and  demonstrated  that  it  pro¬ 
vided  a  better  means  of  discriminating  land-cover  types  with 
a  high-degree  of  spectral  similarity.  Using  examples  from 
AVIRIS  and  PROBE2  imagery,  we  also  showed  that  our  new 
approach  provides  better  compression  of  HSI  data  in  both  ter¬ 
restrial  and  aquatic  imagery.  The  new  method  involves  a  data- 
driven  estimation  of  a  set  of  coordinates  that  parameterizes  the 
high-dimensional  hyperspectral  data  manifold.  The  method 
proceeds  by  calculating  the  local  spectral  neighborhood  dis¬ 
tances  where  linearity  is  assumed  to  hold  about  each  sample 
and  then  determining  the  shortest  nonlinear  path  (geodesic)  dis¬ 
tances  to  all  other  spectral  samples  outside  the  spectral  neigh¬ 
borhood.  These  distances  are  then  used  to  derive  the  manifold 
coordinate  system  that  parameterizes  the  high-dimensional  hy¬ 
perspectral  data  manifold.  In  [4],  we  also  described  methods 
for  achieving  computationally  scalable  implementations  of  this 
approach.  In  Figure  1,  we  provide  a  conceptual  representation 
of  manifold  coordinate  estimation;  note  that  the  manifold  coor¬ 
dinate  system  parameterizes  the  high-dimensional  (124  chan¬ 
nels  in  this  example)  HSI  data  manifold,  so  that  linear  distance 
in  the  coordinates  corresponds  to  a  nonlinear  distance  over  the 
surface  of  the  original  higher-dimensional  data.  In  Section  IV, 
examples  of  this  processing  applied  to  PROBE2  hyperspectral 
imagery  from  our  Virginia  Coast  Reserve  barrier  islands  study 
site  are  provided. 

The  fundamental  computational  steps  are:  partition  the  scene 
into  a  set  of  computationally  tractable  “tiles”,  then  the  compu¬ 
tation  of  a  low-dimensional  set  of  manifold  coordinates  using 
the  Isometric  Mapping  (ISOMAP)  [18]  algorithm,  and  finally 
a  manifold  alignment  stage  using  a  reconstruction  algorithm 
in  which  coordinate  transformations  are  derived  between  the 
manifold  coordinates  of  the  tiles  [4].  Note  that  the  definition 
of  “tractable”  was  addressed  in  [4],  but  improved  scaling  pre¬ 
sented  in  Section  II  expands  the  scale  of  what  is  considered  a 
computationally  feasible  tile  size.  The  ISOMAP  portion  of  the 
computations  involves  the  following  steps:  (1)  given  a  speci¬ 
fied  metric  such  as  Euclidean,  spectral  angle,  or  some  other  ap¬ 
propriate  choice,  determine  the  spectral  neighborhoods  (initial 
sparse  neighborhood  graph)  where  linearity  holds,  maintaining 
a  list  for  each  sample  of  its  neighbors  and  metric  distances;  (2) 
at  each  sample,  for  all  distances  outside  the  neighborhood,  use 
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Fig.  1.  Conceptual  view  of  manifold  coordinate  system.  (Left)  PROBE2  im¬ 
age  subset,  showing  the  source  data  of  124  spectral  channels.  (Center)  RGB 
triplet  shows  a  false  color  image  of  the  scene  derived  from  wavelengths  at 
0.63,  1.29,  and  2.22  /_ im  for  the  subset  over  Smith  Island,  VA,  October  18, 
2001;  shown:  uplands,  brackish  and  fresh  water  marshes,  dunes,  beach,  and 
surf  zone.  (Right,  top)  Corresponding  scatterplot  of  the  reflectance  of  these  ar¬ 
bitrarily  chosen  channels,  reveal  a  highly  nonlinear  HSI  data  manifold.  (Right, 
bottom)  Manifold  coordinate  system  parameterizes  the  HSI  spectral  data  (note 
that  the  coordinates  are  a  parameterization  of  the  full  spectral  data,  not  just  the 
three  arbitrarily  displayed  channels). 


Dijkstra’s  algorithm  [7]  [16]  with  a  minimum  priority  queue  to 
relax  the  closest  edge  not  already  attached  to  the  graph  do  to 
compute  the  shortest  nonlinear  path  (geodesic)  distance  to  all 
other  samples  (note  that  this  is  a  graph  calculation  and  that  the 
metric,  therefore,  is  not  involved  here  but  is  only  evaluated  in 
step  (1)  inside  the  neighborhood);  (3)  if  there  are  any  remain¬ 
ing  distances  which  can  not  be  connected  in  the  distance  graph, 
attach  pockets  of  isolated  points  to  each  other  in  the  graph  by 
find  the  closest  linear  distance  between  pairs  of  isolated  pock¬ 
ets,  thus  preserving  the  geodesic  structure  of  each  and  ensuring 
a  minimal  spanning  tree  [4]  1 ;  after  attaching  all  points  sym¬ 
metrize  to  ensure  consistency  of  paths  in  both  directions;  (4) 
with  the  full  NxN  (N=number  of  spectral  samples)  geodesic 
distance  matrix  calculated  in  steps  (1)  and  (2),  compute  the  sec¬ 
ond  order  variation  in  the  geodesic  distances,  r  =  —  \HT SH, 
where  Sij  =  (( dc)ij )2  and  is  a  centering  ma¬ 

trix  ;  and  (5)  extract  s  (s  «  N)  manifold  coordinates  from  the 
most  significant  eigenvectors  and  eigenvalues  of  the  NxN  ma¬ 
trix  r  with  the  ith  manifold  coordinate  given  by  Mi  =  y/Xivi. 


C.  Scalable  Algorithms 

In  [4],  we  addressed  the  computational  and  memory  scaling 
issues  associated  with  manifold  coordinate  calculations  at  re¬ 
mote  sensing  scales  where  we  may  typically  want  to  process 
O(106  —  107)  pixels  in  a  single  hyperspectral  scene.  One  of  the 
principal  limiting  factors  was  memory  which  scaled  as  0(N2) 
because  of  the  need  to  store  do-  Computationally,  the  Dijkstra 
alogrithm  with  a  minimum  priority  queue  implementation  [16] 

^ote  that  this  can  be  accomplished  by  iteratively  attaching  the  closest 
unattached  sample  to  the  graph  and  then  running  the  Dijkstra  algorithm  on  the 
first  row  of  do  until  all  points  have  a  path  to  the  first  point;  this  ensures  efficient 
scaling  of  0(ctNlog(N))  with  a  <<  N  in  most  cases 


ensures  that  the  graph  calculation  scales  as  0(N2log(N)). 
Because  of  the  computational  and  memory  requirements,  we 
developed  a  scaling  strategy  [4]  in  which  large  hyperspectral 
scenes  are  divided  into  a  computationally  tractable  set  of  data 
blocks  or  “tiles”  for  which  manifold  coordinates  can  be  opti¬ 
mally  computed,  followed  by  an  alignment  phase  during  which 
the  embedded  manifold  coordinates  for  each  tile  subset  are 
aligned  to  a  common  manifold  coordinate  system. 

In  [4],  several  strategies  for  alignment  were  proposed.  These 
included:  (a)  splicing  a  set  of  common  samples  onto  each  tile 
which  could  serve  as  guide-posts  for  manifold  alignment,  (b) 
partitioning  the  scene  into  tiles  by  random  or  active  sampling 
followed  by  an  alignment  stage,  and  finally  (c)  a  direct  recon¬ 
struction  technique,  in  which  full  spectral  samples  from  one  tile 
(derived  from  the  original  scene  or  a  decimated  subset)  were 
reconstructed  in  the  spectral  space  of  another  tile  using  the  lo¬ 
cally  linear  property  of  the  manifold.  The  same  set  of  transfor¬ 
mations  apply  equally  in  the  manifold  coordinate  and  the  full 
spectral  space.  By  reconstructing  enough  samples  where  there 
is  sufficient  data  in  each  tile  for  accurate  reconstruction,  a  coor¬ 
dinate  transformation  can  be  derived  using  the  pseudoinverse: 

P  ={M?Mi)~1MfM*  (1) 

where  Mi  is  the  matrix  of  manifold  coordinate  samples  from 
tile  Ti  and  M*  is  the  corresponding  set  of  coordinates  recon¬ 
structed  in  the  manifold  coordinate  system  of  tile  Tj.  When 
no  sufficiently  accurate  reconstruction  was  possible  to  a  pre¬ 
chosen  target  tile,  a  series  of  alignment  hops  was  used  between 
intermediate  tiles  possessing  common  features  of  source  and 
target  tiles. 

The  reconstruction  method  was  determined  to  be  the  most  ef¬ 
fective  of  the  manifold  alignment  strategies,  with  the  others  less 
effective,  primarily  because  of  sampling  limitations  that  result 
from  restrictions  on  tile  size  imposed  by  memory  limitations. 
In  the  next  section,  however,  we  incorporate  a  method  which 
allows  the  do  to  be  replaced  by  a  significantly  smaller  but  rep¬ 
resentative  matrix  that  mitigates  the  memory  burden.  The  re¬ 
duced  memory  requirements  of  the  modified  approach  allow 
for  several  of  these  alignment  strategies  to  be  used  more  prac¬ 
tically  or  in  combination,  although  because  of  limited  space 
we  only  demonstrate  the  advantages  of  the  improved  scaling 
for  the  manifold  alignment  strategy  based  on  the  reconstruction 
principle  of  Equation  1 .  In  addition,  to  lower  memory  require¬ 
ments,  the  method  described  in  the  next  Section  also  stream¬ 
lines  other  computational  issues  such  as  the  eigensolution  of  r, 
eliminating  iterative  eigensolvers  in  favor  of  more  reliable  ex¬ 
act  solvers  appropriate  to  smaller  matrices,  and  also  results  in 
fewer  geodesic  distances  calculations  (the  0(N2log(N)  Dijk¬ 
stra  calculation  is  now  replaced  with  an  0(LNlog(N ))  calcu¬ 
lation  with  L  «  N. 

With  the  new  method  described  in  the  next  Section,  another 
additional  benefit  is  that  the  probability  of  alignment  errors 
should  be  lower  since  the  size  of  each  tile  can  be  larger  and 
more  representative  of  the  scene.  This  will  help  to  eliminate  oc¬ 
casional  alignment  errors  that  appeared  originally  in  [4]  which 
resulted  from  incomplete  constraint  of  manifold  coordinates  be¬ 
tween  tiles,  stemming  from  the  limited  sampling  available  in 
each  tile.  However,  one  potential  challenge  with  larger  tiles  is 


that  more  constraints  must  be  satisfied  in  each  alignment  be¬ 
cause  of  the  greater  diversity  of  spectral  samples  represented  in 
each  tile;  this  potentially  requires  a  more  flexible  local  recon¬ 
struction  error  criterion  that  potentially  takes  account  of  other 
issues  such  as  sample  density.  A  full  discussion  of  the  latter 
will  be  taken  up  in  future  publications. 

II.  Improved  Scaling  for  Manifold  Coordinate 
Representations 


An  improvement  to  the  processing  speed  and  memory  re¬ 
quirements  associated  with  ISOMAP  was  described  in  [6].  The 
improved  method  chooses  a  set  of  “landmarks”  (L-ISOMAP) 
from  which  all  of  the  manifold  geodesic  distances  do  are  cal¬ 
culated.  This  forms  an  LxN  geodesic  distance  matrix  with 
L  «  N.  The  symmetric  submatrix  dx  of  distances  between 
landmarks  is  an  LxL  matrix  whose  eigenvalues  and  eigenvec¬ 
tors  form  the  basis  of  the  embedding  of  the  manifold  coordi¬ 
nates.  Note  that  so  long  as  the  sampled  landmarks  span  the 
space  of  the  embedded  manifold  coordinates,  the  landmark  dis¬ 
tances  are  sufficient  to  calculate  the  manifold  coordinate  sys¬ 
tem.  Note  also  that  the  eigenvector  and  eigenvalue  problem  of 
a  large  NxN  matrix  has  been  replaced  by  a  smaller  LxL  prob¬ 
lem.  As  before,  the  second  order  variation  in  dx  is  computed 

according  to:  rL  = - \HtSlH ,  where  ( SL)ij  =  (( dL)ij )2. 

For  r,  iterative  methods  [17]  were  used  to  extract  the  eigen- 
spectrum,  however,  with  the  LxL  matrix  rx,  more  reliable  ex¬ 
act  eigensolvers  can  be  employed.  Once  the  most  significant 
eigenvalues  and  eigenvectors  of  tl  have  been  determined,  the 
manifold  coordinates  of  the  remaining  non-landmark  samples 
can  be  computed  by  a  simple  linear  transformation  since  their 
distances  to  the  landmark  positions  are  all  known: 

M(x)  =  PL  *  ((A  -  A)  (2) 


where  M(x)  is  the  embedded  manifold  coordinate  of  spectral 
sample  x,  P  is  a  matrix  whose  ith  row  is: 


(pL)i  = 


V(v)i 


(3) 


2002  fall  collection  and  included  in  all  subsequent  collections. 
COMPASS  acquired  data  over  the  same  set  of  islands  in  fall 
2003.  In  2004,  the  Naval  Research  Laboratory’s  PHILLS  [8] 
began  acquiring  imagery  over  the  island  chain  twice  annually 
in  spring  and  summer. 


Fig.  2.  Virginia  Coast  Reserve  study  area  and  adjacent  mainland  (Northamp¬ 
ton  and  Accomack  counties)  shown  in  a  Landsat  TM  image  from  August  6, 
1999.  Red  boxes  outline  regions  where  our  airborne  hyperpspectral  imagery 
time  series  has  been  collected  between  2000-2005  by  a  variety  of  sensors  in¬ 
cluding  HyMAP,  PROBE2,  COMPASS,  and  PHILLS. 

One  focus  of  our  research  at  the  Virginia  Coast  Reserve  site 
has  been  the  devlopment  and  testing  of  algorithms  for  detailed 
species-level  mapping  of  coastal  land-cover  [2]  [3]  [1]  [4].  Us¬ 
ing  the  hyperspectral  time  series,  we  have  developed  fast  online 
methods  for  fusing  the  clasification  results  from  multi-temporal 
inputs,  for  example,  mapping  products  developed  for  different 
seasons  [3].  The  fusion  of  classifier  results  uses  smooth  esti¬ 
mated  measures  of  classifier  reliability  to  determine  a  final  cat¬ 
egory  at  each  pixel.  Extensive  ground  truth  data  has  been  col¬ 
lected  by  us  [2]  [3]  throughout  all  of  the  islands  in  the  chain,  in¬ 
cluding  both  in  situ  reflectance  data,  BRDF,  and  more  recently 
biophysical  data  including  biomass,  canopy  light  penetration, 
leaf  area  index  (LAI),  and  leaf  optical  measurements  [5]. 


where  ( vl)%  and  {\l)i  are  the  ith  eigenvector  and  eigenvec¬ 
tor  of  tl ,  A*  =  ELj{{{dL)LiLj)2)  is  the  mean  squared  dis¬ 
tance  from  the  ith  landmark  to  all  other  landmarks,  and  A ^  = 
((dL) Lij)2  is  the  squared  distance  from  sample  j  to  the  ith  land¬ 
mark. 

III.  Hyperspectral  Time  Series  and  Study  Site 

In  May  2000,  we  began  airborne  hyperspectral  data  acqui¬ 
sitions  over  a  subset  of  the  Virginia  barrier  islands,  collectively 
known  as  the  Virginia  Coast  Reserve  [19]  [10]  [13]  show  in 
Figure  2.  A  time  series  of  airborne  hyperspectral  images  has 
been  collected  over  the  region  outlined  in  boxes  in  Figure  2. 
Beginning  with  a  single  scene  over  Smith  Island,  VA  in  May 

2000  by  HyMAP,  acquisitions  have  continued  to  the  present 
day  and  have  included  scenes  covering  the  seven  islands  be¬ 
tween  Smith  and  Hog  Islands  inclusive  in  summer  and  fall  of 

2001  and  2002.  Parramore  Island  was  also  added  during  the 


IV.  Results 

Our  first  example  illustrates  the  advantage  of  L-ISOMAP  ap¬ 
plied  to  the  problem  of  single  tile  optimization.  Because  com¬ 
putational  and  memory  requirements  are  lower  for  L-ISOMAP, 
we  are  able  to  derive  the  manifold  coordinates  directly  for 
whole  cross-sections  of  the  scene  treated  as  a  single  tile;  typ¬ 
ically  this  means  an  increase  in  tile  size  by  more  than  an  order 
of  magnitude  when  compared  with  the  tile  size  used  in  [4]  (see 
Figure  3).  In  Figure  4,  we  show  an  example  derived  from  one 
of  these  enlarged  tiles  taken  from  the  northern  end  of  the  Smith 
Island  PROBE2  hyperspectral  scene  originally  depicted  in  Fig¬ 
ure  4.  Also  shown  are  example  manifold  coordinates  (2-3-4) 
derived  using  the  L-ISOMAP  processing.  This  end  of  Smith 
Island  is  dominated  by  salt  marsh,  dunes,  beach,  and  tidal  es¬ 
tuaries  and  channels  connecting  to  the  Atlantic  Ocean.  A  rich 
structure  related  to  species  distribution  and  perhaps  biophysi¬ 
cal  parameters  is  delineated  in  the  manifold  coordinates.  In  our 


Fig.  3.  RGB  image  derived  from  PROBE2  HSI  scene  of  Smith  Island,  VA  ac¬ 
quired  October  18,  2001,  and  typical  tile  size  used  in  our  previously  published 
alignment  of  tile  HSI  manifold  coordinates,  showing  an  N=(75x75)  pixel  tile 
size.  Note  that  because  second  order  geodesic  distance  matrix  r  is  0(N2)  a 
machine  memory  limit  of  1GB  would  impose  a  tile  size  limit  of  N=(  105x1 05). 
Also  shown,  typical  tile  size  made  possible  by  landmarks  processing  which 
has  memory  requirements  of  0(LN )  with  L  «  N. 


Fig.  4.  (Top)  Subset  of  PROBE2  hyperspectral  scene  shown  in  Figure  3: 
a  cross-section  of  the  northern  end  of  Smith  Island,  (displayed  wavelengths: 
0.65,  0.55,  0.45  pm).  (Bottom)  Manifold  coordinates  2-3-4  obtained  using 
L-ISOMAP,  showing  extensive  structure  in  marsh  zones  and  shallow  water. 


second  example  (Figure  5),  we  portray  the  alignment  of  two  of 
these  enlarged  tiles  for  another  scene  of  Hog  Island,  VA  taken 
by  PROBE2,  also  on  October  18,  2001.  The  Figure  shows  a 
section  of  the  northern  end  of  Hog  Island,  with  a  diverse  cross- 
section  of  the  island  portrayed,  including  salt  marsh,  upland 
zones,  brackish  and  fresh  water  marsh  species,  dune  environ¬ 
ment,  and  beaches.  This  island  subset  was  partitioned  into  two 
different  cross-island  swaths  (tiles);  manifold  coordinates  were 
obtained  using  L-ISOMAP  and  then  the  manifold  coordinates 
were  aligned  using  the  method  described  in  [4] . 
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